The Use of Covariate Adjustment in Randomized Controlled Trials: An Overview
There has been a growing interest in covariate adjustment in the analysis of
randomized controlled trials in past years. For instance, the U.S. Food and
Drug Administration recently issued guidance that emphasizes the importance of
distinguishing between conditional and marginal treatment effects. Although
these effects coincide in linear models, this is not typically the case in
other settings, and this distinction is often overlooked in clinical trial
practice. Considering these developments, this paper provides a review of when
and how to utilize covariate adjustment to enhance precision in randomized
controlled trials. We describe the differences between conditional and marginal
estimands and stress the necessity of aligning statistical analysis methods
with the chosen estimand. Additionally, we highlight the potential misalignment
of current practices in estimating marginal treatment effects. Instead, we
advocate for the utilization of standardization, which can improve efficiency
by leveraging the information contained in baseline covariates while remaining
robust to model misspecification. Finally, we present practical considerations
that have arisen in our respective consultations to further clarify the
advantages and limitations of covariate adjustment.
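As a concrete illustration of the standardization (g-computation) approach advocated in this abstract, the following numpy-only sketch estimates a marginal risk difference in a simulated trial with a logistic working model; the data-generating process, variable names, and sample size are illustrative assumptions, not from the paper. Note how the marginal risk difference and the conditional odds ratio answer different questions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                      # baseline covariate
a = rng.integers(0, 2, size=n)              # randomized treatment
p = 1 / (1 + np.exp(-(-1.0 + 1.0 * a + 1.5 * x)))
y = rng.binomial(1, p)                      # binary outcome

def fit_logistic(X, y, iters=50):
    """Plain Newton-Raphson logistic regression (no penalty)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X @ beta))
        W = mu * (1 - mu)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
    return beta

X = np.column_stack([np.ones(n), a, x])     # working model for E[Y | A, X]
beta = fit_logistic(X, y)

# Standardization: predict for everyone under A=1 and under A=0, then average.
X1 = np.column_stack([np.ones(n), np.ones(n), x])
X0 = np.column_stack([np.ones(n), np.zeros(n), x])
mu1 = 1 / (1 + np.exp(-X1 @ beta))
mu0 = 1 / (1 + np.exp(-X0 @ beta))
marginal_rd = mu1.mean() - mu0.mean()       # marginal risk difference
conditional_or = np.exp(beta[1])            # conditional odds ratio
print(marginal_rd, conditional_or)
```

As the abstract notes for nonlinear models, the two summaries do not coincide: here the conditional odds ratio is fixed by the working model's treatment coefficient, while the marginal risk difference averages covariate-specific predictions over the trial population.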
Proximal mediation analysis
A common concern when trying to draw causal inferences from observational
data is that the measured covariates are insufficiently rich to account for all
sources of confounding. In practice, many of the covariates may only be proxies
of the latent confounding mechanism. Recent work has shown that in certain
settings where the standard 'no unmeasured confounding' assumption fails, proxy
variables can be leveraged to identify causal effects. Results currently exist
for the total causal effect of an intervention, but little consideration has
been given to learning about the direct or indirect pathways of the effect
through a mediator variable. In this work, we describe three separate proximal
identification results for natural direct and indirect effects in the presence
of unmeasured confounding. We then develop a semiparametric framework for
inference on natural (in)direct effects, which leads us to locally efficient,
multiply robust estimators. (Comment: 60 pages, 3 figures)
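For orientation, the sketch below computes natural direct and indirect effects via the classical mediation formula under the standard 'no unmeasured confounding' assumption, i.e., the baseline setting whose failure motivates the proximal results above; the simulated data and plug-in estimator are illustrative assumptions only, not the paper's proximal estimators:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
a = rng.integers(0, 2, size=n)
m = rng.binomial(1, 0.3 + 0.4 * a)              # mediator: P(M=1 | A=a)
y = 1.0 * a + 2.0 * m + rng.normal(size=n)      # outcome

# Plug-in mediation formula (binary M), assuming no unmeasured confounding:
#   NDE = sum_m (E[Y|A=1,M=m] - E[Y|A=0,M=m]) P(M=m|A=0)
#   NIE = sum_m E[Y|A=1,M=m] (P(M=m|A=1) - P(M=m|A=0))
pm1 = m[a == 1].mean()                          # P(M=1 | A=1)
pm0 = m[a == 0].mean()                          # P(M=1 | A=0)
ey = {(ai, mi): y[(a == ai) & (m == mi)].mean()
      for ai in (0, 1) for mi in (0, 1)}
nde = sum((ey[1, mi] - ey[0, mi]) * (pm0 if mi else 1 - pm0)
          for mi in (0, 1))
nie = sum(ey[1, mi] * ((pm1 if mi else 1 - pm1) - (pm0 if mi else 1 - pm0))
          for mi in (0, 1))
print(nde, nie)  # population truth in this simulation: NDE = 1.0, NIE = 0.8
```

The proximal results in the paper identify these same estimands when the measured covariates are only proxies of the confounders, a setting in which this naive plug-in would be biased.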
Adjusting for time-varying confounders in survival analysis using structural nested cumulative survival time models.
Accounting for time-varying confounding when assessing the causal effects of time-varying exposures on survival time is challenging. Standard survival methods that incorporate time-varying confounders as covariates generally yield biased effect estimates. Estimators using weighting by inverse probability of exposure can be unstable when confounders are highly predictive of exposure or the exposure is continuous. Structural nested accelerated failure time models (AFTMs) require artificial recensoring, which can cause estimation difficulties. Here, we introduce the structural nested cumulative survival time model (SNCSTM). This model assumes that intervening to set exposure at time t to zero has an additive effect on the subsequent conditional hazard given exposure and confounder histories when all subsequent exposures have already been set to zero. We show how to fit it using standard software for generalized linear models and describe two more efficient, doubly robust, closed-form estimators. All three estimators avoid the artificial recensoring of AFTMs and the instability of estimators that use weighting by the inverse probability of exposure. We examine the performance of our estimators using a simulation study and illustrate their use on data from the UK Cystic Fibrosis Registry. The SNCSTM is compared with a recently proposed structural nested cumulative failure time model, and several advantages of the former are identified.
Reconciling model-X and doubly robust approaches to conditional independence testing
Model-X approaches to testing conditional independence between a predictor
and an outcome variable given a vector of covariates usually assume exact
knowledge of the conditional distribution of the predictor given the
covariates. Nevertheless, model-X methodologies are often deployed with this
conditional distribution learned in sample. We investigate the consequences of
this choice through the lens of the distilled conditional randomization test
(dCRT). We find that Type-I error control is still possible, but only if the
mean of the outcome variable given the covariates is estimated well enough.
This demonstrates that the dCRT is doubly robust, and motivates a comparison to
the generalized covariance measure (GCM) test, another doubly robust
conditional independence test. We prove that these two tests are asymptotically
equivalent, and show that the GCM test is in fact optimal against (generalized)
partially linear alternatives by leveraging semiparametric efficiency theory.
In an extensive simulation study, we compare the dCRT to the GCM test. We find
that the GCM test and the dCRT are quite similar in terms of both Type-I error
and power, and that post-lasso based test statistics (as compared to lasso
based statistics) can dramatically improve Type-I error control for both
methods.
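A minimal numpy sketch of the GCM-type residual-product statistic discussed above, on illustrative simulated data (OLS stands in for the generic regression estimates of the two conditional means; all names and the design are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 3
Z = rng.normal(size=(n, d))                               # covariates
X = Z @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)  # predictor
Y = Z @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)   # Y indep. of X given Z

def ols_residuals(target, Z):
    """Residuals from an intercept-plus-linear fit of target on Z."""
    Zc = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Zc, target, rcond=None)
    return target - Zc @ coef

rX = ols_residuals(X, Z)                    # X - estimated E[X | Z]
rY = ols_residuals(Y, Z)                    # Y - estimated E[Y | Z]
R = rX * rY
T = np.sqrt(n) * R.mean() / R.std()         # approx N(0, 1) under H0

# Under an alternative where X enters the outcome, the statistic blows up:
Y2 = Y + 0.2 * X
rY2 = ols_residuals(Y2, Z)
R2 = rX * rY2
T2 = np.sqrt(n) * R2.mean() / R2.std()
print(T, T2)
```

The double robustness discussed in the abstract shows up here: the product-of-residuals statistic remains approximately standard normal under the null as long as at least one of the two conditional-mean fits is accurate enough.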
Bespoke Instrumental Variables for Causal Inference
Many proposals for the identification of causal effects in the presence of
unmeasured confounding require an instrumental variable or negative control
that satisfies strong, untestable assumptions. In this paper, we will instead
show how one can identify causal effects for a point exposure by using a
measured confounder as a 'bespoke instrumental variable'. This strategy
requires an external reference population that does not have access to the
exposure, and a stability condition on the confounder outcome association
between reference and target populations. Building on recent identification
results of Richardson and Tchetgen Tchetgen (2021), we develop the
semiparametric efficiency theory for a general bespoke instrumental variable
model, and obtain a multiply robust locally efficient estimator of the average
treatment effect in the treated. (Comment: 48 pages)
Augmented balancing weights as linear regression
We provide a novel characterization of augmented balancing weights, also
known as Automatic Debiased Machine Learning (AutoDML). These estimators
combine outcome modeling with balancing weights, which estimate inverse
propensity score weights directly. When the outcome and weighting models are
both linear in some (possibly infinite) basis, we show that the augmented
estimator is equivalent to a single linear model with coefficients that combine
the original outcome model coefficients and OLS; in many settings, the
augmented estimator collapses to OLS alone. We then extend these results to
specific choices of outcome and weighting models. We first show that the
combined estimator that uses (kernel) ridge regression for both outcome and
weighting models is equivalent to a single, undersmoothed (kernel) ridge
regression; this also holds when considering asymptotic rates. When the
weighting model is instead lasso regression, we give closed-form expressions
for special cases and demonstrate a "double selection" property. Finally, we
generalize these results to linear estimands via the Riesz representer. Our
framework "opens the black box" on these increasingly popular estimators and
provides important insights into estimation choices for augmented balancing
weights.
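The augmented estimators analyzed above combine an outcome model with weights that (directly or indirectly) estimate inverse propensity scores. The following AIPW-style sketch uses linear outcome models and a logistic weighting model on illustrative simulated data; it is a minimal stand-in for the estimators the abstract characterizes, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
x = rng.normal(size=(n, 2))
ps = 1 / (1 + np.exp(-(x @ np.array([0.5, -0.5]))))
a = rng.binomial(1, ps)
y = 1.0 * a + x @ np.array([1.0, 1.0]) + rng.normal(size=n)  # true ATE = 1

X = np.column_stack([np.ones(n), x])

def ols_fit(Xs, ys):
    coef, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return coef

# Outcome models: separate linear fits in each treatment arm.
m1 = X @ ols_fit(X[a == 1], y[a == 1])
m0 = X @ ols_fit(X[a == 0], y[a == 0])

# Weighting model: logistic propensity score via Newton-Raphson.
def fit_logistic(Xs, t, iters=30):
    beta = np.zeros(Xs.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-Xs @ beta))
        W = mu * (1 - mu)
        beta += np.linalg.solve(Xs.T @ (W[:, None] * Xs), Xs.T @ (t - mu))
    return beta

e = 1 / (1 + np.exp(-X @ fit_logistic(X, a)))

# Augmented estimator = outcome-model plug-in + weighted residual correction.
ate_aipw = np.mean(m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e))
print(ate_aipw)
```

With both nuisance models linear in the same basis, the paper's equivalence results describe exactly when a combined estimator like this collapses to a single (possibly undersmoothed) linear regression.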
Doubly robust tests of exposure effects under high-dimensional confounding.
After variable selection, standard inferential procedures for regression parameters may not be uniformly valid; there is no finite-sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is exacerbated in high-dimensional settings, where variable selection becomes unavoidable. This has prompted a flurry of activity in developing uniformly valid hypothesis tests for a low-dimensional regression parameter (e.g., the causal effect of an exposure A on an outcome Y) in high-dimensional models. So far there has been limited focus on model misspecification, although this is inevitable in high-dimensional settings. We propose tests of the null that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming working models for the exposure and outcome are both correctly specified. When one of the models is misspecified, by amending the procedure for estimating the nuisance parameters, our tests continue to be valid; hence, they are doubly robust. Our proposals are straightforward to implement using existing software for penalized maximum likelihood estimation and do not require sample splitting. We illustrate them in simulations and an analysis of data obtained from the Ghent University intensive care unit.
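A rough numpy-only sketch of a residual-product (decorrelated-score-style) test with lasso nuisance estimates, in the spirit of the tests above but not the paper's exact procedure; the tiny coordinate-descent lasso, the tuning choice, and the simulated high-dimensional design are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 100
X = rng.normal(size=(n, p))                 # high-dimensional confounders
gamma = np.zeros(p)
gamma[:3] = 1.0                             # sparse exposure model
delta = np.zeros(p)
delta[:3] = 1.0                             # sparse outcome model
A = X @ gamma + rng.normal(size=n)          # continuous exposure
Y = X @ delta + rng.normal(size=n)          # null holds: no exposure effect

def lasso_cd(X, y, lam, sweeps=100):
    """Tiny coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]          # partial residual excluding j
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam * n, 0.0) / col_sq[j]
            r -= X[:, j] * beta[j]
    return beta

lam = np.sqrt(np.log(p) / n)                # usual sqrt(log p / n) scaling
bA = lasso_cd(X, A, lam)
bY = lasso_cd(X, Y, lam)
S = (A - X @ bA) * (Y - X @ bY)
T = np.sqrt(n) * S.mean() / S.std()         # approx N(0, 1) under the null

def post_lasso_resid(X, y, beta):
    """OLS refit on the lasso-selected support, removing shrinkage bias."""
    sel = np.flatnonzero(beta)
    if sel.size == 0:
        return y - y.mean()
    coef, *_ = np.linalg.lstsq(X[:, sel], y, rcond=None)
    return y - X[:, sel] @ coef

S2 = post_lasso_resid(X, A, bA) * post_lasso_resid(X, Y, bY)
T_pl = np.sqrt(n) * S2.mean() / S2.std()    # post-lasso version
print(T, T_pl)
```

The post-lasso refit echoes the empirical finding in the dCRT/GCM abstract above: refitting on the selected support removes the lasso's shrinkage bias from the residuals, which tends to improve Type-I error calibration.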